Abstract

Nonnegative Matrix Factorization (NMF) is a widely used 
technique in many applications such as face recognition, motion 
segmentation, etc. It approximates the nonnegative data in an 
original high dimensional space with a linear representation in a 
low dimensional space by using the product of two nonnegative 
matrices. In many applications data are often partially corrupted 
with large additive noise. When the positions of noise are known, 
some existing variants of NMF can be applied by treating these 
corrupted entries as missing values. However, the positions are 
often unknown in many real world applications, which prevents 
the usage of traditional NMF or other existing variants of NMF. 
This paper proposes a Robust Nonnegative Matrix Factorization
(RobustNMF) algorithm that explicitly models the partial
corruption as large additive noise without requiring the
positions of the noise. In practice, large additive noise can be used
to model outliers. In particular, the proposed method jointly
approximates the clean data matrix with the product of two
nonnegative matrices and estimates the positions and values of
outliers/noise. An efficient iterative optimization algorithm with a
solid theoretical justification is proposed to learn the
desired matrix factorization. Experimental results demonstrate the
advantages of the proposed algorithm.

1 Introduction 

Nonnegative Matrix Factorization (NMF) has been widely applied in many
applications such as face recognition [1] and motion segmentation [2]. NMF has
received substantial attention due to its theoretical interpretation and practical
performance.

Several variants of NMF have been proposed recently to improve the
performance. Sparseness constraints have been incorporated into NMF to obtain sparse
solutions. The NMF algorithms in [5, 6] are proposed to preserve the local
structure on the low dimensional manifold(s). To be robust to outliers, [7] proposes
RSNMF, which is based on an outlier resistant objective function, and [8] maintains
an outlier list in NMF for more robust performance.




Figure 1: Large Additive Noise/Partial Corruption/Outlier

In real applications, data samples are often partially corrupted (e.g.,
salt-and-pepper noise in images, occlusion on faces). Figure 1 shows some examples of this
kind of partial corruption. Intuitively, partial corruption can be treated as large
additive noise. Unfortunately, traditional methods based on least square estimation,
such as NMF and PCA, are sensitive to this kind of noise [9], since the underlying
assumption of a Gaussian noise distribution is not valid. Some recent work
[10, 11, 12] tries to deal with partial corruption. These methods usually assume the
positions of the corruption are given in advance, and then ignore the corresponding data
entries. However, it is unrealistic to assume that the positions of corruption are known in
many real world applications. [13] proposes Robust PCA to recover the noise
values and positions.

This paper proposes a Robust Nonnegative Matrix Factorization (RobustNMF)
approach, which is able to simultaneously learn the basis matrix and the coefficient
matrix and estimate the positions and values of noise. The underlying observation is that
the clean data allow a nonnegative factorization and the noise is sparse. An
efficient iterative optimization algorithm with solid theoretical justification is
proposed to obtain the desired solution of the RobustNMF approach. To the best of
our knowledge, our work is the first NMF technique that generates robust results
for data with large additive noise (partial corruption) without requiring the
positions of the noise.

The rest of this paper is organized as follows. Section 2 reviews the
traditional NMF algorithm. Section 3 proposes the RobustNMF algorithm, followed by
the iterative optimization method in section 4. Section 5 provides some theoretical
justification of the optimization method, and the experimental results are shown in
section 6. Finally, we conclude and discuss future work.



2 Review of Nonnegative Matrix Factorization 

Given a nonnegative matrix $X \in \mathbb{R}^{m \times n}$, where each column of $X$ represents a data
sample, the NMF algorithm aims to learn two nonnegative matrices $U \in \mathbb{R}^{m \times k}$
and $V \in \mathbb{R}^{k \times n}$ for approximating $X$ by their product, i.e. $X \approx UV$. To
learn $U$ and $V$, the following objective function should be minimized:

$$O = \|X - UV\|_F^2 \quad \text{s.t. } U \ge 0,\ V \ge 0 \tag{1}$$

where $\|\cdot\|_F$ denotes the Frobenius norm.

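The following iterative multiplicative updating algorithm is proposed in [14]
to minimize the above objective function:

$$U_{ij} = U_{ij} \frac{(XV^T)_{ij}}{(UVV^T)_{ij}} \tag{2}$$

$$V_{ij} = V_{ij} \frac{(U^TX)_{ij}}{(U^TUV)_{ij}} \tag{3}$$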
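As an illustration, the multiplicative updates of [14] can be sketched in NumPy as follows. This is a minimal sketch rather than a reference implementation: the function name `nmf_multiplicative`, the random initialization, the fixed iteration count, and the small constant `eps` added to the denominators to avoid division by zero are all assumptions that do not appear in the text.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative updates for min ||X - UV||_F^2 with U >= 0, V >= 0.

    X is an (m, n) nonnegative array; k is the target rank.
    eps and the random initialization are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k)) + eps
    V = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # U_ij <- U_ij * (X V^T)_ij / (U V V^T)_ij
        U *= (X @ V.T) / (U @ V @ V.T + eps)
        # V_ij <- V_ij * (U^T X)_ij / (U^T U V)_ij
        V *= (U.T @ X) / (U.T @ U @ V + eps)
    return U, V
```

Because every factor in the updates is nonnegative, $U$ and $V$ stay nonnegative without any explicit projection.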



3 Robust Nonnegative Matrix Factorization 

The proposed Robust Nonnegative Matrix Factorization (RobustNMF) algorithm
explicitly models the partial corruption, which is treated as large additive noise.
Let the nonnegative matrix $X \in \mathbb{R}^{m \times n}$ denote the observed corrupted data, where
each column of $X$ is a data sample. Let $\hat{X} \in \mathbb{R}^{m \times n}$ denote the clean data without
pollution. We have $X = \hat{X} + E$, where $E \in \mathbb{R}^{m \times n}$ is the large additive noise.
Note that the large additive noise $E$ is not zero-mean Gaussian noise, the kind that
least square error minimization handles well. Moreover, we are concerned
with partial corruption, and partial means the noise distribution is sparse. In other
words, only a small portion of the entries of $E$ are nonzero. For example, in face
recognition, the occlusion by glasses is an instance of this kind of noise, and it
covers only a small portion of the entire face.

The clean data $\hat{X}$ is approximated by $UV$ ($U \in \mathbb{R}^{m \times k}$, $V \in \mathbb{R}^{k \times n}$) as in
traditional NMF, thus we have

$$X \approx UV + E \tag{4}$$
Considering the above model and the sparseness of the large additive noise $E$,
the objective function of RobustNMF is defined as follows:

$$O_{RobustNMF} = \|X - UV - E\|_F^2 + \lambda \sum_j \left[\|E_{\cdot j}\|_0\right]^2 \tag{5}$$

The first term approximates the clean data; the second term is obtained from
the sparseness constraint on $E$. The parameter $\lambda$ controls the tradeoff between the
two terms, so its value depends on what portion of the entries is corrupted.

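However, the $L_0$ norm in the second term makes this objective function difficult
to optimize, so the $L_1$ norm is employed to approximate it, which has been a popular
strategy in prior research [15]. Substituting the $L_1$ norm into the objective function,
we have

$$\begin{aligned}
O_{RobustNMF} &= \|X - UV - E\|_F^2 + \lambda \sum_j \left[\|E_{\cdot j}\|_1\right]^2 \\
&= \left\|X - [U, I, -I] \begin{pmatrix} V \\ E^p \\ E^n \end{pmatrix}\right\|_F^2
   + \lambda \sum_j \left[\|E^p_{\cdot j}\|_1 + \|E^n_{\cdot j}\|_1\right]^2
\end{aligned} \tag{6}$$

where $E = E^p - E^n$, $E^p = \frac{|E| + E}{2}$, $E^n = \frac{|E| - E}{2}$, and $E^p \ge 0$, $E^n \ge 0$.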

Now we have a squared $L_1$ norm penalty for sparseness, which has been shown to
be effective and computationally convenient [16]. Note that $E$ is the sparse
large additive noise, whose entries could be either negative or nonnegative. We need to
decompose $E$ into the two nonnegative matrices $E^p$ and $E^n$ described above to gain
the nonnegativity that makes the optimization convenient. We also impose the
constraint that $X - E \ge 0$, since the clean data should be nonnegative. Finally, the
objective function should be minimized with respect to $U$, $V$, $E^p$, and $E^n$ subject
to the constraints that $U \ge 0$, $V \ge 0$, $E^p \ge 0$, $E^n \ge 0$, and $X - E \ge 0$.
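To make the relaxed objective concrete, the sketch below splits a signed noise matrix into its nonnegative positive and negative parts and evaluates the squared-$L_1$ objective. The helper names `split_noise` and `robust_nmf_objective` are hypothetical, and the split uses the elementwise positive and negative parts of $E$, which is assumed here to be the decomposition intended by the text.

```python
import numpy as np

def split_noise(E):
    """Split E into nonnegative parts so that E = Ep - En (assumed decomposition)."""
    Ep = np.maximum(E, 0.0)    # positive part, equals (|E| + E) / 2
    En = np.maximum(-E, 0.0)   # negative part, equals (|E| - E) / 2
    return Ep, En

def robust_nmf_objective(X, U, V, Ep, En, lam):
    """||X - UV - (Ep - En)||_F^2 + lam * sum_j (||Ep_j||_1 + ||En_j||_1)^2."""
    residual = X - U @ V - (Ep - En)
    fit = np.sum(residual ** 2)
    # Ep and En are nonnegative, so column sums equal column-wise L1 norms.
    col_l1 = Ep.sum(axis=0) + En.sum(axis=0)
    return fit + lam * np.sum(col_l1 ** 2)
```

Since the two parts have disjoint supports, the column sums of `Ep + En` coincide with the column-wise $L_1$ norms of $E$.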



4 Optimization 

Since $O_{RobustNMF}$ is not jointly convex in $U$, $V$, $E^p$, and $E^n$, it is difficult to
find the global minimum of $O_{RobustNMF}$. Instead, we aim to find a local
minimum by iteratively updating $U$, $V$, $E^p$ and $E^n$ in a way similar to the work [14]
for NMF.

4.1 Update U

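Given $V$, $E^p$, and $E^n$, we update $U$ to decrease the value of the objective function.

$$\begin{aligned}
U &= \arg\min_{U \ge 0} \left\|X - [U, I, -I] \begin{pmatrix} V \\ E^p \\ E^n \end{pmatrix}\right\|_F^2
     + \lambda \sum_j \left[\|E^p_{\cdot j}\|_1 + \|E^n_{\cdot j}\|_1\right]^2 \\
  &= \arg\min_{U \ge 0} \|[X - E] - UV\|_F^2
\end{aligned} \tag{7}$$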

The updating rule for $U$ to reduce the objective function is as follows, which
can be proven in a similar way as in [14].

$$U_{ij} = U_{ij} \frac{(\hat{X}V^T)_{ij}}{(UVV^T)_{ij}} \tag{8}$$

where $\hat{X} = X - E$. Note that at this step $E$ is given, and it satisfies the
constraint that $X - E \ge 0$.
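A minimal NumPy sketch of this step is given below. It assumes the rule is the standard multiplicative update applied to the noise-corrected data $X - E$; the function name `update_U` and the stabilizing constant `eps` are illustrative assumptions.

```python
import numpy as np

def update_U(X, U, V, E, eps=1e-12):
    """One multiplicative update of U for fixed V and E.

    Assumes X - E is entrywise nonnegative, as required by the constraint
    stated in the text; eps only guards against division by zero.
    """
    X_clean = X - E
    return U * (X_clean @ V.T) / (U @ V @ V.T + eps)
```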



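4.2 Update $V$, $E^p$, and $E^n$



Now we decrease the objective function with respect to $V$, $E^p$ and $E^n$ given $U$.
Let $\tilde{V} = \begin{pmatrix} V \\ E^p \\ E^n \end{pmatrix}$.

The updating rule for $\tilde{V}$ is:

$$\tilde{V}_{ij} = \max\left(0,\ \tilde{V}_{ij}
  - \tilde{V}_{ij}\frac{(\tilde{U}^T \tilde{U} \tilde{V})_{ij}}{(S\tilde{V})_{ij}}
  + \tilde{V}_{ij}\frac{(\tilde{U}^T \tilde{X})_{ij}}{(S\tilde{V})_{ij}}\right) \tag{9}$$

where $\tilde{X} = \begin{pmatrix} X \\ 0_{1 \times n} \end{pmatrix}$,
$\tilde{U} = \begin{pmatrix} U & I & -I \\ 0_{1 \times k} & \sqrt{\lambda}\, e_{1 \times m} & \sqrt{\lambda}\, e_{1 \times m} \end{pmatrix}$,
$e_{1 \times m}$ is the all-ones row vector, and $S$ is defined as

$$S_{ij} = |(\tilde{U}^T \tilde{U})_{ij}| \tag{10}$$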
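The following NumPy sketch illustrates one way to implement this step. It assumes the augmented matrices stack $X$ over a zero row and place a row $(0, \sqrt{\lambda}\,e, \sqrt{\lambda}\,e)$ under $[U, I, -I]$, and that the update subtracts a term scaled by $\tilde{U}^T\tilde{U}\tilde{V}$, adds a term scaled by $\tilde{U}^T\tilde{X}$, divides both by $S\tilde{V}$, and thresholds at zero. The names `build_augmented` and `update_V_tilde` and the constant `eps` are hypothetical.

```python
import numpy as np

def build_augmented(X, U, lam):
    """Return (X_t, U_t) so that ||X_t - U_t V_t||_F^2 equals the relaxed objective."""
    m, n = X.shape
    k = U.shape[1]
    I = np.eye(m)
    penalty_row = np.sqrt(lam) * np.ones((1, m))
    X_t = np.vstack([X, np.zeros((1, n))])
    U_t = np.vstack([
        np.hstack([U, I, -I]),
        np.hstack([np.zeros((1, k)), penalty_row, penalty_row]),
    ])
    return X_t, U_t

def update_V_tilde(X_t, U_t, V_t, eps=1e-12):
    """One thresholded update of the stacked matrix V_t = [V; Ep; En]."""
    G = U_t.T @ U_t            # may contain negative entries
    S = np.abs(G)              # entrywise absolute value
    denom = S @ V_t + eps
    V_new = V_t - V_t * (G @ V_t) / denom + V_t * (U_t.T @ X_t) / denom
    return np.maximum(V_new, 0.0)   # threshold at zero
```

After the update, the first $k$ rows of the result hold $V$, the next $m$ rows hold $E^p$, and the last $m$ rows hold $E^n$. Entries that reach zero stay at zero under this rule, so a strictly positive initialization is advisable.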

5 Correctness of Updating Rules 

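To decrease the objective function with respect to $V$, $E^p$ and $E^n$, we have:

$$\begin{aligned}
(V, E^p, E^n) &= \arg\min_{V, E^p, E^n \ge 0} O_{RobustNMF} \\
&= \arg\min_{V, E^p, E^n \ge 0} \left\| \begin{pmatrix} X \\ 0_{1 \times n} \end{pmatrix}
   - \begin{pmatrix} U & I & -I \\ 0_{1 \times k} & \sqrt{\lambda}\, e_{1 \times m} & \sqrt{\lambda}\, e_{1 \times m} \end{pmatrix}
     \begin{pmatrix} V \\ E^p \\ E^n \end{pmatrix} \right\|_F^2 \\
&= \arg\min_{V, E^p, E^n \ge 0} \|\tilde{X} - \tilde{U}\tilde{V}\|_F^2
\end{aligned} \tag{11}$$

where $\tilde{X} = \begin{pmatrix} X \\ 0_{1 \times n} \end{pmatrix}$,
$\tilde{U} = \begin{pmatrix} U & I & -I \\ 0_{1 \times k} & \sqrt{\lambda}\, e_{1 \times m} & \sqrt{\lambda}\, e_{1 \times m} \end{pmatrix}$,
and $\tilde{V} = \begin{pmatrix} V \\ E^p \\ E^n \end{pmatrix}$.

Updating $\tilde{V}$ is more involved than updating $U$, since $\tilde{U}$ contains
some negative values. Now we prove the correctness of the updating rule for $V$,
$E^p$ and $E^n$ proposed in section 4.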



5.1 Decrease Objective Function 

Definition 1 [14] $Z(v, v')$ is an auxiliary function for $F(v)$ if it satisfies the
following conditions:

$$Z(v, v') \ge F(v), \quad Z(v, v) = F(v)$$

Lemma 1 If $Z$ is an auxiliary function, then $F$ is nonincreasing under the
update

$$v^{t+1} = \arg\min_v Z(v, v^t)$$

Now we generalize Lemma 1 to Lemma 2.

Lemma 2 If $Z$ is an auxiliary function, then $F$ is nonincreasing as long as $v^{t+1}$
satisfies the following condition:

$$Z(v^{t+1}, v^t) \le Z(v^t, v^t)$$

Proof:

$$F(v^{t+1}) \le Z(v^{t+1}, v^t) \le Z(v^t, v^t) = F(v^t) \quad \square$$

This generalization from Lemma 1 to Lemma 2 is similar to the generalization
from EM to Generalized EM.

In our problem, $\tilde{U}$ contains some negative values. Thus the updating rules in
[14] do not hold, so we seek new updating rules.

Define a matrix $S$ as follows.

$$S_{ij} = |(\tilde{U}^T \tilde{U})_{ij}| \tag{12}$$
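Lemma 3 If $K(v^t)$ is the diagonal matrix

$$K_{ab}(v^t) = \delta_{ab} \frac{(S v^t)_a}{v^t_a} \tag{13}$$

then

$$Z(v, v^t) = F(v^t) + (v - v^t)^T \nabla F(v^t)
  + \frac{1}{2}(v - v^t)^T K(v^t)(v - v^t) \tag{14}$$

is an auxiliary function for

$$F(v) = \frac{1}{2} \sum_i \Big(\tilde{x}_i - \sum_a \tilde{U}_{ia} v_a\Big)^2$$

Proof:

$Z(v, v) = F(v)$, obviously. Now we prove that $Z(v, v^t) \ge F(v)$.
Comparing

$$F(v) = F(v^t) + (v - v^t)^T \nabla F(v^t)
  + \frac{1}{2}(v - v^t)^T (\tilde{U}^T \tilde{U})(v - v^t) \tag{15}$$

to $Z(v, v^t)$, we find that we only need to show

$$(v - v^t)^T K(v^t)(v - v^t) - (v - v^t)^T (\tilde{U}^T \tilde{U})(v - v^t) \ge 0 \tag{16}$$

$$(v - v^t)^T [K(v^t) - \tilde{U}^T \tilde{U}](v - v^t) \ge 0 \tag{17}$$

To prove the positive semidefiniteness, consider the matrix $M(v^t)$:

$$M_{ab}(v^t) = v^t_a [K(v^t) - \tilde{U}^T \tilde{U}]_{ab} v^t_b \tag{18}$$

$M$ is a rescaling of $K(v^t) - \tilde{U}^T \tilde{U}$, so $M$ is positive semidefinite if and only
if $K(v^t) - \tilde{U}^T \tilde{U}$ is. For any vector $\mu$,

$$\begin{aligned}
\mu^T M \mu &= \sum_{ab} \mu_a M_{ab} \mu_b \\
&= \sum_{ab} \left[ S_{ab} v^t_a v^t_b \mu_a^2
   - \mu_a v^t_a (\tilde{U}^T \tilde{U})_{ab} v^t_b \mu_b \right] \\
&= \sum_{ab} S_{ab} v^t_a v^t_b \left[ \tfrac{1}{2}\mu_a^2 + \tfrac{1}{2}\mu_b^2
   - \mathrm{sgn}((\tilde{U}^T \tilde{U})_{ab})\, \mu_a \mu_b \right] \\
&= \frac{1}{2} \sum_{ab} S_{ab} v^t_a v^t_b
   \left( \mu_a - \mathrm{sgn}((\tilde{U}^T \tilde{U})_{ab})\, \mu_b \right)^2 \\
&\ge 0
\end{aligned} \tag{19}$$

where

$$\mathrm{sgn}(x) = \begin{cases} -1, & x < 0 \\ 0, & x = 0 \\ 1, & x > 0 \end{cases} \tag{20}$$

Note that in our setting $\tilde{U}$ contains some negative values, but $\tilde{V}$ is nonnegative. $\square$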
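The positive semidefiniteness claimed in Lemma 3 can be checked numerically on a small random instance. The sketch below is only a sanity check under stated assumptions, not a substitute for the proof: the matrix `U_t` is a random stand-in with mixed signs, `v` is a strictly positive vector, and the helper name `lemma3_gap_matrix` is hypothetical.

```python
import numpy as np

def lemma3_gap_matrix(U_t, v):
    """Return K(v) - U_t^T U_t, with K diagonal and K_aa = (S v)_a / v_a."""
    G = U_t.T @ U_t
    S = np.abs(G)
    K = np.diag((S @ v) / v)
    return K - G

rng = np.random.default_rng(0)
U_t = rng.standard_normal((8, 5))   # mixed-sign stand-in for the augmented basis
v = rng.random(5) + 0.1             # strictly positive vector
# The gap matrix is symmetric, so eigvalsh applies.
min_eig = np.linalg.eigvalsh(lemma3_gap_matrix(U_t, v)).min()
```

Up to floating-point rounding, `min_eig` is expected to be nonnegative, in agreement with the lemma.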
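Substituting the auxiliary function of Lemma 3 into Lemma 1, the updating rule is:

$$v^{t+1} = v^t - K(v^t)^{-1} \nabla F(v^t) \tag{21}$$

Writing the components explicitly, with $\nabla F(v^t) = \tilde{U}^T \tilde{U} v^t - \tilde{U}^T \tilde{x}$, we get:

$$v^{t+1}_a = v^t_a - v^t_a \frac{(\tilde{U}^T \tilde{U} v^t)_a}{(S v^t)_a}
  + v^t_a \frac{(\tilde{U}^T \tilde{x})_a}{(S v^t)_a} \tag{22}$$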



The proposed updating rule can deal with negative values by explicitly
considering the negative part of the large additive noise in $\tilde{U}$. If $\tilde{U} \ge 0$, the first two
terms in the above updating rule would cancel each other, resulting in the same rule
as in [14].

The $\tilde{V}$ obtained by (22) is made up of three parts: $V$, $E^p$, and $E^n$. All of them
should be nonnegative. Unfortunately, the value $v^{t+1}$ obtained by rule (22) is
not guaranteed to be nonnegative. Now we discuss how to keep it nonnegative while
updating the values.
In the auxiliary function $Z$, we see that $K(v^t)$ is a diagonal matrix. Thus
the second order terms only involve the form $v_a^2$. This results in a very important
property in Lemma 4.

Lemma 4 If $v^{t+1} = \arg\min_v Z(v, v^t)$ and $v^t \ge v'^{t+1} \ge v^{t+1}$, then
$Z(v'^{t+1}, v^t) \le Z(v^t, v^t)$.

According to Lemma 2 and Lemma 4, we have $F(v'^{t+1}) \le F(v^t)$.

So, to ensure the nonnegativity, we can simply threshold $v^{t+1}$ at 0. This
operation enforces the nonnegativity while keeping the value of $F$ nonincreasing
from $v^t$ to the thresholded $v^{t+1}$. Thus, we have the updating rule described in (9).

5.2 Convergence Analysis 

Since the objective function has a lower bound, e.g., 0, and the updating rules for
$U$, $V$, and $E$ all keep the objective function nonincreasing, the objective value is
guaranteed to converge.
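Putting the two steps of section 4 together, the alternating scheme could be sketched as below. This is an illustrative sketch under several assumptions that are not stated in the text: the initialization (random $U$ and $V$, small equal positive values for $E^p$ and $E^n$ so that $E$ starts at zero), the fixed iteration count, the constant `eps`, and the clipping of $X - E$ at zero in the $U$ step as a simple stand-in for the constraint $X - E \ge 0$. With that clipping, the monotone decrease discussed above is not guaranteed once $X - E$ has negative entries. The function name `robust_nmf` is hypothetical.

```python
import numpy as np

def robust_nmf(X, k, lam=0.04, n_iter=100, eps=1e-12, seed=0):
    """Alternate the U step and the stacked [V; Ep; En] step (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k)) + eps
    V_t = np.vstack([rng.random((k, n)) + eps,   # V
                     np.full((m, n), 1e-3),      # Ep, small positive start
                     np.full((m, n), 1e-3)])     # En, so that E = 0 initially
    X_t = np.vstack([X, np.zeros((1, n))])
    bottom = np.hstack([np.zeros((1, k)), np.full((1, 2 * m), np.sqrt(lam))])

    def augmented(U):
        return np.vstack([np.hstack([U, np.eye(m), -np.eye(m)]), bottom])

    def objective(U, V_t):
        return np.sum((X_t - augmented(U) @ V_t) ** 2)

    history = [objective(U, V_t)]
    for _ in range(n_iter):
        V, E = V_t[:k], V_t[k:k + m] - V_t[k + m:]
        X_hat = np.maximum(X - E, 0.0)           # assumption: clip to keep X - E >= 0
        U = U * (X_hat @ V.T) / (U @ V @ V.T + eps)
        U_t = augmented(U)
        G = U_t.T @ U_t
        denom = np.abs(G) @ V_t + eps
        V_t = np.maximum(
            V_t - V_t * (G @ V_t) / denom + V_t * (U_t.T @ X_t) / denom, 0.0)
        history.append(objective(U, V_t))
    V, E = V_t[:k], V_t[k:k + m] - V_t[k + m:]
    return U, V, E, history
```

The returned `history` records the objective value after each full iteration and can be inspected to monitor convergence in practice.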



6 Experimental Results 

This section presents experimental results on two different applications: noise
detection (identifying the exact positions of large additive noise) and image
reconstruction/denoising.



6.1 Large Additive Noise Detection 

Several algorithms have been proposed to deal with data with partial corruption.
However, they usually assume the positions of noise are known in advance.
Fortunately, the proposed RobustNMF is able to locate the positions. Once the
corrupted entries are located, existing algorithms can also be applied by treating
them as missing values. This subsection presents experiments for detecting the
positions of large noise in face images and image patches. The reported results are
averaged over ten runs.

The experiments are based on the ORL face dataset. Each face image is of
size 32x32, and thus is represented by a 1024 dimensional vector. For each face in a
randomly selected subset, 50 pixels are randomly selected and replaced with the
value 255 to simulate the large additive noise. The polluted faces make up
the data matrix $X$, each column of which corresponds to a polluted face image.
Then, we apply the RobustNMF algorithm to this $X$ to estimate $U$, $V$ and $E$.
Furthermore, we scan all the entries of $E$. When $E_{ij}$ is nonzero, we claim
that the corresponding pixel is polluted. With the above procedure, we are able
to detect the positions of noise by analyzing $E$. The performance is evaluated
by precision and recall, where

$$\text{Precision} = \frac{\#\text{correct detections}}{\#\text{detections}} \times 100\%, \qquad
\text{Recall} = \frac{\#\text{correct detections}}{\#\text{polluted pixels}} \times 100\%.$$

[Figure 2 plots: precision and recall (%) versus #Faces (left) and versus $\lambda$ (middle, right)]

Figure 2: Noise Detection Results in Face Images

Here we only show the performance of
our algorithm, since no other algorithm, to the best of our knowledge, is designed
to handle this task. We tried to compare the proposed algorithm with PCA and
NMF. We first applied PCA or NMF to the noisy data $X$ to obtain a reconstruction
$\bar{X}$, and then tried to detect the positions of noise by analyzing the difference $X - \bar{X}$.
However, it is very difficult to find an appropriate threshold on the difference, and
the performance is very sensitive to it. We tried several thresholds, all of which
gave poor results, probably because the partial corruption significantly skews the
solutions of PCA and NMF. In figure 2, the left subfigure presents the precision and
recall versus the number of face images. The algorithm achieves a precision of
over 90% and a recall of over 50%. The performance improves as the number of
face images increases. This is reasonable, since more samples mean
more information that RobustNMF can exploit. When the number is large enough,
increasing it further no longer improves the performance. Here
$k$ is set to 10, and $\lambda$ is set to 0.04.
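The corruption protocol and the evaluation can be sketched as follows. The sketch assumes the usual definitions: precision is the share of detected entries that are truly polluted, and recall is the share of truly polluted entries that are detected. The function names `corrupt_columns` and `precision_recall`, the tolerance `tol` used to decide whether an entry of the estimated noise counts as nonzero, and the seed are hypothetical.

```python
import numpy as np

def corrupt_columns(X_clean, n_pixels=50, value=255.0, seed=0):
    """Overwrite n_pixels randomly chosen entries of every column with `value`."""
    rng = np.random.default_rng(seed)
    X = X_clean.astype(float).copy()
    mask = np.zeros(X.shape, dtype=bool)
    for j in range(X.shape[1]):
        idx = rng.choice(X.shape[0], size=n_pixels, replace=False)
        X[idx, j] = value
        mask[idx, j] = True
    return X, mask

def precision_recall(E_est, true_mask, tol=0.0):
    """Precision and recall (in percent) of the detection rule |E_ij| > tol."""
    detected = np.abs(E_est) > tol
    hits = np.count_nonzero(detected & true_mask)
    precision = 100.0 * hits / max(np.count_nonzero(detected), 1)
    recall = 100.0 * hits / max(np.count_nonzero(true_mask), 1)
    return precision, recall
```

In floating-point arithmetic an estimated noise entry is rarely exactly zero unless it has been thresholded, so a small positive `tol` may be a sensible choice in practice.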

The middle and right subfigures in figure 2 investigate the relationship between
performance and the parameter $\lambda$. Here $k$ is still set to 10, and the number of face
images is fixed at 50 in the middle subfigure and 100 in the right subfigure. Generally
speaking, the algorithm achieves over 90% precision and over 50% recall. With larger
values of $\lambda$, the precision becomes a little higher and the recall a little lower.
This is consistent with our expectation, since a larger $\lambda$ makes the detected noise
sparser, which often leads to higher precision and lower recall.

6.2 Image Reconstruction/Denoise 

This subsection presents the performance of RobustNMF on reconstruction/denoising.
First, we simulate large additive noise in the same way as in the previous subsection.
Given the polluted data $X$, RobustNMF learns $U$, $V$ and $E$. The original
images are reconstructed as $UV$.

As discussed before, some other algorithms, such as WNMF [10], are able
to handle large noise if the positions are given. Thus, we can use RobustNMF
to locate the noise and then employ WNMF to recover the images. The reason for
combining these two methods is that RobustNMF approximates the $L_0$ norm by the $L_1$
norm, which may cause the absolute values of the estimated $E$ to be smaller
than the truth. Let RobustNMF+WNMF denote the new combined method.
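The combination can be illustrated with a generic weighted NMF that ignores the entries flagged by RobustNMF. The sketch below uses the common weighted multiplicative updates with a binary weight matrix; it is not claimed to be the exact WNMF algorithm of [10]. The names `mask_from_noise` and `weighted_nmf`, the tolerance `tol`, and the constant `eps` are hypothetical.

```python
import numpy as np

def mask_from_noise(E_est, tol=0.0):
    """Weight 1 for entries judged clean (|E_ij| <= tol), 0 for flagged entries."""
    return (np.abs(E_est) <= tol).astype(float)

def weighted_nmf(X, W, k, n_iter=200, eps=1e-12, seed=0):
    """Multiplicative updates for min sum_ij W_ij * (X_ij - (UV)_ij)^2, U, V >= 0."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U = rng.random((m, k)) + eps
    V = rng.random((k, n)) + eps
    WX = W * X
    for _ in range(n_iter):
        U *= (WX @ V.T) / ((W * (U @ V)) @ V.T + eps)
        V *= (U.T @ WX) / (U.T @ (W * (U @ V)) + eps)
    return U, V
```

A typical use would be `W = mask_from_noise(E)` followed by `weighted_nmf(X, W, k)`, where `E` is the noise estimate returned by RobustNMF.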

6.2.1 Reconstruction of Faces 

A subset of faces from the ORL face dataset is selected, and the same kind of large
noise is added to generate a set of polluted samples denoted as $X$, in a similar way as
described in Section 6.1, while the original data samples form the matrix $\hat{X}$. Mean
Squared Error (MSRE) is used to measure the reconstruction performance.

For NMF, matrices $U$ and $V$ are learned based on $X$, and then are used in
reconstruction. Compared with the original noise free matrix $\hat{X}$, the MSRE is
calculated as $\frac{1}{N}\|\hat{X} - UV\|_F^2$, where $N$ is the number of samples. For RobustNMF,
the MSRE definition is the same as for NMF. For RobustNMF+WNMF, based on
$X$, RobustNMF learns $U$, $V$, and $E$. Since $E$ is an indicator of whether a pixel
is polluted or not, WNMF takes $E$ as a mask and learns new matrices $U$ and $V$. The
MSRE is again defined as $\frac{1}{N}\|\hat{X} - UV\|_F^2$, using the new $U$ and $V$.
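For reference, the error measure can be computed as in the short sketch below, which assumes the normalization is the squared Frobenius norm divided by the number of samples (columns). The function name `msre` is hypothetical.

```python
import numpy as np

def msre(X_clean, U, V):
    """Squared Frobenius reconstruction error against the clean data, per sample."""
    n_samples = X_clean.shape[1]   # each column is one sample
    return np.sum((X_clean - U @ V) ** 2) / n_samples
```

The same function applies to all three methods; only the factors $U$ and $V$ passed to it differ.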

Experiments are conducted with varying numbers of polluted pixels and faces.
In the first set of experiments, we fix the number of faces to be 50 or 100, and then
vary the number of polluted pixels from 10 to 100 with a step of 10. This means
that about 1 to 10 percent of the pixels are corrupted in each face. The
results are shown in the top row of figure 3. In the second set of experiments, the number
of polluted pixels is fixed at 50 or 100, which means about 5 or 10 percent of the pixels
on each face are corrupted. Experiments are conducted with various numbers of
faces, from 10 to 100 with a step of 10. The results are shown in the bottom row of
figure 3.

It can be seen from these experiments that both RobustNMF and RobustNMF+WNMF
consistently outperform the traditional NMF with varying numbers of data samples.
With an increasing amount of noise, the advantages of the proposed algorithms
become even larger. This is because RobustNMF is able to detect the positions of
the large value noise, i.e. the partial corruption, which enables the application of
WNMF. Because the $L_0$ norm is approximated by the $L_1$ norm, the large noise
is underestimated, and that is why we prefer RobustNMF+WNMF to pure RobustNMF,
even though both methods outperform the traditional NMF.
Figure 3: Face Reconstruction Results - Top: MSRE vs. # Polluted Pixels in Each
Face; Bottom: MSRE vs. # Faces



6.2.2 Image Denoising (Reconstruction of Patches)

This subsection presents some experimental results on image denoising using
RobustNMF. Salt-and-pepper noise is added to natural images. The noise density is set
to 5%, which means about 5% of the pixels are affected. The noisy image is converted
into a set of patches, to which RobustNMF is applied. $\lambda$ is set to 0.04 and $k$ is set
to 10. $UV$ is used to reconstruct the original image. Some denoising results are
shown in figure 4. The first row shows the generated polluted images, the second
row shows the results denoised by traditional NMF, and the third row shows the results
of RobustNMF. It can be seen that RobustNMF outperforms traditional NMF. Due
to the space limit, the ground truth image, the results by RobustNMF+WNMF, and
more experiments on other images are given in supplemental materials.
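The preprocessing for this experiment might look like the sketch below. The text does not specify the patch size or whether patches overlap, so non-overlapping square patches of side `p` (default 8) are assumed here. The noise model, which sets roughly half of the affected pixels to 0 and half to 255, and the function names are likewise assumptions.

```python
import numpy as np

def add_salt_and_pepper(img, density=0.05, seed=0):
    """Set about density/2 of the pixels to 0 and about density/2 to 255."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(float).copy()
    r = rng.random(img.shape)
    noisy[r < density / 2] = 0.0
    noisy[(r >= density / 2) & (r < density)] = 255.0
    return noisy

def image_to_patches(img, p=8):
    """Cut an (h, w) image into non-overlapping p x p patches, one per column."""
    h, w = img.shape
    assert h % p == 0 and w % p == 0, "image sides must be multiples of p"
    blocks = img.reshape(h // p, p, w // p, p).swapaxes(1, 2)
    return blocks.reshape(-1, p * p).T   # shape (p * p, number of patches)

def patches_to_image(P, shape, p=8):
    """Inverse of image_to_patches for the same patch size."""
    h, w = shape
    blocks = P.T.reshape(h // p, w // p, p, p).swapaxes(1, 2)
    return blocks.reshape(h, w)
```

The patch matrix plays the role of $X$, and the product $UV$ returned by the factorization can be mapped back to an image with `patches_to_image`.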



Figure 4: Image Denoising Results - Top: Polluted Images; Middle: Results by
NMF; Bottom: Results by RobustNMF



7 Conclusion

Data in many real world applications are often partially corrupted without
explicit information about the positions of the noise, which prevents the usage of NMF
and other existing variants. This paper proposes a RobustNMF algorithm for large
additive noise, which can handle partial corruption without requiring the position
information of the noise in advance. The proposed algorithm is able to simultaneously
locate and estimate the large additive noise and learn the basis matrix $U$ and
coefficient matrix $V$ in the framework of NMF. The proposed algorithm also paves the
way to apply other variants of NMF (e.g. WNMF) to data with missing values by
estimating the positions of noise. An efficient optimization algorithm with a solid
theoretical justification is proposed for RobustNMF. Experimental results on three
different sets of applications demonstrate the advantages of our algorithm.

As for future research, we plan to explore a low rank version of RobustNMF, 
which can automatically find the adequate low rank of the decomposed matrices. 
Similar to RobustPCA [13], more applications of RobustNMF can be investigated,
since NMF is widely used in various areas, including computer vision, text mining,
and speech analysis.